Doc: Update Files metadata table #3422
cc @samredai as the docs refactor is underway 👍
site/docs/spark-queries.md
| | s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null | [4] | | ||
| | s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null | [4] | | ||
| +-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+------------------+-----------------+-----------------+--------------+---------------+ | ||
| +-------+-------------------------------------------------------------------------+-----------+---------------+------------+------------------+---------------------------+------------------------+------------------------+----------------+---------------------------------------+---------------------------------------+------------+-------------+------------+-------------+ |
Is it possible to avoid reverting the formatting changes? I think the table is less readable with the leading space removed.
Maybe we should replace this with a real HTML table?
Here's a snippet of code that we use in our notebooks to format PySpark dataframes as nicer tables:
```python
from prettytable import PrettyTable


class DFTable(PrettyTable):
    """PrettyTable that renders as text in a terminal and as HTML in notebooks."""

    def __repr__(self):
        return self.get_string()

    def _repr_html_(self):
        # Jupyter/IPython calls _repr_html_ to display rich output.
        return self.get_html_string()


def to_table(df, num_rows=100):
    """Render up to num_rows rows of a PySpark DataFrame as a DFTable."""
    cols = df.columns
    t = DFTable()
    t.field_names = cols
    t.align = "r"  # right-align every column
    for row in df.limit(num_rows).collect():
        d = row.asDict()
        t.add_row([d[col] for col in cols])
    return t
```

That will produce both HTML and text tables with reasonable formatting.
Makes sense, let me look at this.
kbendick left a comment
Somewhat unrelated, but I noticed that lower_bounds has no value for some keys. Is this possibly confusing for readers of the docs?
[1 -> , 2 -> c]
Notice that key 1 doesn't point to anything. Is this intentional, and do we think it might confuse readers? I can open a separate issue if so.
@kbendick, I think that we should convert lower and upper bounds into human-readable strings. Right now, I think we pass them to Spark as a map of id to binary.
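Iceberg stores each bound as bytes keyed by field ID, using the spec's single-value binary serialization (fixed-width numbers are little-endian; strings are UTF-8 without a length prefix). The following is a minimal sketch of turning such a map into readable strings. It assumes the caller supplies a hypothetical `field_types` mapping from field ID to a primitive type name, covers only a few primitive types, and falls back to hex for everything else; it is not how Iceberg itself implements the conversion.

```python
import struct
from typing import Dict, Mapping

# Decoders for a few primitive types, following Iceberg's single-value
# serialization: fixed-width numbers are little-endian, strings are UTF-8.
_DECODERS = {
    "boolean": lambda b: "true" if b[0] != 0 else "false",
    "int": lambda b: str(struct.unpack("<i", b)[0]),
    "long": lambda b: str(struct.unpack("<q", b)[0]),
    "float": lambda b: repr(struct.unpack("<f", b)[0]),
    "double": lambda b: repr(struct.unpack("<d", b)[0]),
    "string": lambda b: b.decode("utf-8"),
}


def readable_bounds(
    bounds: Mapping[int, bytes], field_types: Mapping[int, str]
) -> Dict[int, str]:
    """Decode a field-id -> bytes bounds map into field-id -> string."""
    result = {}
    for field_id, raw in bounds.items():
        decoder = _DECODERS.get(field_types.get(field_id, ""))
        # Fall back to hex for types this sketch does not handle.
        result[field_id] = decoder(raw) if decoder else raw.hex()
    return result


if __name__ == "__main__":
    bounds = {1: struct.pack("<i", 1), 2: b"b"}
    print(readable_bounds(bounds, {1: "int", 2: "string"}))  # {1: '1', 2: 'b'}
```

Doing this where the metadata table is built, rather than in every reader, would let `lower_bounds` and `upper_bounds` render as plain values instead of raw bytes.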
@rdblue @kbendick @samredai I made a markdown table for files; if you click 'View File' on GitHub, it should show what it looks like. If you want, I can extend this to the other tables too, or do it separately. By the way, I also found a possible problem in the spec_id column added in #3015: it should probably be hidden, just like the partition column, if the table is unpartitioned. I could fix that in another PR.
+1 to adding the scroll bar. Thanks @samredai!
The markdown table looks good to me. The only problem is that now only this table is in markdown. Does anyone want to follow up by updating the other tables?
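One possible starting point for converting the remaining tables is a small script that turns `DataFrame.show()`-style ASCII output (rows wrapped in `|`, borders made of `+---+`) into a markdown table. The function name is hypothetical, and the sketch assumes the first `|` row is the header and that no cell contains a literal `|` (which markdown would need escaped as `\|`).

```python
from typing import List


def show_output_to_markdown(text: str) -> str:
    """Convert DataFrame.show()-style ASCII output into a markdown table.

    Assumes the first "|" row is the header and that no cell contains "|".
    """
    rows: List[List[str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Keep only "|...|" data rows; "+---+" borders and blank lines are skipped.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    out = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",  # header separator
    ]
    out += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(out)
```

Because cell padding is stripped, the output stays readable in raw markdown, and the horizontal layout is left to the site's table styling.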
The Files metadata table was missing some columns and also lacked an example for partitioned tables (users may want to know how to check which data files they have for a given partition).
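As an illustration of checking the data files in one partition, the sketch below builds a Spark SQL query against the `files` metadata table and filters on a field of its `partition` struct. The helper name, the table name `prod.db.table`, and the partition field `data` are assumptions for the example; identifiers are backtick-quoted, and string values containing quotes or backslashes are rejected rather than escaped to keep the sketch simple.

```python
from typing import Union


def files_for_partition_sql(
    table: str, partition_field: str, value: Union[int, str]
) -> str:
    """Build a Spark SQL query listing the data files in one partition."""
    if "`" in partition_field:
        raise ValueError("backticks in field names are not handled in this sketch")
    if isinstance(value, int):
        literal = str(value)
    elif "'" in value or "\\" in value:
        raise ValueError("quotes/backslashes are not handled in this sketch")
    else:
        literal = f"'{value}'"
    # The files metadata table exposes partition values as a struct column.
    return (
        f"SELECT file_path, record_count FROM {table}.files "
        f"WHERE `partition`.`{partition_field}` = {literal}"
    )
```

In PySpark, the result could then be run with something like `spark.sql(files_for_partition_sql("prod.db.table", "data", "a")).show(truncate=False)`.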
@KnightChess I'm not sure, to be honest. Can you put up your change on your PR? We can continue the discussion there instead of on this one.
@szehon-ho, can you update the CSS to the following, if we're using this for all markdown tables?

```css
.markdown-table-container {
  width: 780px;
  overflow-x: auto;
}
```
Done. Thanks @KnightChess for finding the issue that would occur with the other table, and @samredai for figuring it out.
@szehon-ho, yes, the plan is to keep the markdown files in this repo. There are some changes coming soon, but any merged docs PR will be included. This looks good to me; I'll let @rdblue comment on whether it's ready to merge.
Merged. Thanks, @szehon-ho!
Thanks for the fast response!
* apache/iceberg#3723
* apache/iceberg#3732
* apache/iceberg#3749
* apache/iceberg#3766
* apache/iceberg#3787
* apache/iceberg#3796
* apache/iceberg#3809
* apache/iceberg#3820
* apache/iceberg#3878
* apache/iceberg#3890
* apache/iceberg#3892
* apache/iceberg#3944
* apache/iceberg#3976
* apache/iceberg#3993
* apache/iceberg#3996
* apache/iceberg#4008
* apache/iceberg#3758 and 3856
* apache/iceberg#3761
* apache/iceberg#2062
* apache/iceberg#3422
* remove restriction related to legacy parquet file list




The Files metadata table was missing some columns and also lacked an example for partitioned tables, in which the new partition column is added (users may want to know how to check which data files they have for a given partition).